Please note that authorship is alphabetical. Contributions are listed below - see github for details and who to blame for what :-).
Ben Anderson (b.anderson@soton.ac.uk @dataknut)
If you wish to refer to any of the material from this report please cite as:
Report circulation:
Report purpose:
official New Zealand Water Quality dataThis work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 700386 (SPATIALEC).
This work is (c) 2019 the University of Southampton.
Data source: https://data.mfe.govt.nz/data/category/fresh-water/
Data file: mfe-river-water-quality-trends-20082017-19982017-and-19902017-CSV/river-water-quality-trends-20082017-19982017-and-1990-2017-raw-data.csv
df <- paste0(dPath, dataFile)
wqdt <- data.table::fread(df)
Overall there are:
This isn’t really trend or raw data - it is a single record per site with indicator of change via a trend slope coefficient or a single % change value. We were looking for the data used to create the time-series plots in (for example) https://www.lawa.org.nz/explore-data/otago-region/river-quality/manuherikia-river/manuherikia-at-ophir/
There are also data quality & missing data issues (see Statistical Annex). It’s noticeable for example that, irrespective of the indicator, over half of stations were thought to have insufficient data to determine trend direction. This was true of 70% of stations for ECOLI and 78% for the Macroinvertebrate Community Index (MCI).
tq <- table(wqdt$TrendCategory, wqdt$npID)
kableExtra::kable(addmargins(tq), caption = "Trend category counts") %>% kable_styling()
| CLAR | DRP | ECOLI | MCI | NH4N | NO3N | TN | TP | TURB | Sum | |
|---|---|---|---|---|---|---|---|---|---|---|
| Decreasing | 82 | 355 | 167 | 132 | 428 | 318 | 204 | 415 | 137 | 2238 |
| Increasing | 259 | 245 | 118 | 79 | 79 | 266 | 211 | 91 | 173 | 1521 |
| Insufficient Data | 426 | 630 | 642 | 764 | 713 | 593 | 499 | 578 | 567 | 5412 |
| Sum | 767 | 1230 | 927 | 975 | 1220 | 1177 | 914 | 1084 | 877 | 9171 |
kableExtra::kable(100 * prop.table(tq, 2), digits = 2, caption = "Trend category %") %>%
kable_styling()
| CLAR | DRP | ECOLI | MCI | NH4N | NO3N | TN | TP | TURB | |
|---|---|---|---|---|---|---|---|---|---|
| Decreasing | 10.69 | 28.86 | 18.02 | 13.54 | 35.08 | 27.02 | 22.32 | 38.28 | 15.62 |
| Increasing | 33.77 | 19.92 | 12.73 | 8.10 | 6.48 | 22.60 | 23.09 | 8.39 | 19.73 |
| Insufficient Data | 55.54 | 51.22 | 69.26 | 78.36 | 58.44 | 50.38 | 54.60 | 53.32 | 64.65 |
plotDT <- wqdt[, .(meanAnnualSenSlope = mean(AnnualSenSlope, na.rm = TRUE)),
keyby = .(npID, TrendCategory)]
ggplot2::ggplot(plotDT, aes(x = TrendCategory, y = meanAnnualSenSlope, group = npID,
fill = npID)) + geom_col(position = "dodge")
Figure 2.1: Test data values by indicator
plotDT <- wqdt[, .(meanPercent_annual_change = mean(Percent_annual_change, na.rm = TRUE)),
keyby = .(npID, TrendCategory)]
ggplot2::ggplot(plotDT, aes(x = TrendCategory, y = meanPercent_annual_change,
group = npID, fill = npID)) + geom_col(position = "dodge")
Figure 2.2: Test data values by indicator
Now try to plot individual trends. This is nasty…
Ideally we want to sort by value within pollutant and ‘fade out’ the ones with insufficient data. Colour coding needs to be careful - for some such as MCI an increase is ‘good’.
p <- ggplot2::ggplot(wqdt, aes(y = sID,
x = Percent_annual_change,
alpha = forcats::fct_rev(TrendCategory), # reverse so lightest alpha == least data
colour = forcats::fct_rev(TrendCategory)
)
) +
geom_vline(xintercept = 0, colour = "red") +
geom_point() +
guides(alpha = guide_legend(title = "Trend and quality code")
) +
guides(colour = guide_legend(title = "Trend and quality code")
) +
scale_color_viridis(discrete = TRUE) +
theme(legend.position = "bottom") +
facet_grid(. ~ npID)
p
Figure 2.3: Site level trend plots
So this is the trend data we were after…
Also downloadable from:
f <- "LAWA Rivers_DownloadWQRawData_Oct2018.csv"
p <- "~/Data/NZ_LAWA/waterQuality/"
df <- paste0(p, f)
lawadt <- data.table::fread(df)
Using:
names(lawadt)
## [1] "LawaSiteID" "Site" "parameter" "Date" "RawValue"
## [6] "Symbol" "Value" "Region" "Agency" "License"
## [11] "Ownership" "Disclaimer"
lawadt$Ownership <- NULL # not used
lawadt$License <- NULL # not used
lawadt$Disclaimer <- NULL # not used
lawadt[, `:=`(RawValue, as.numeric(RawValue))]
lawadt[, `:=`(region.site, paste0(Region, ".", Site))]
lawadt[, `:=`(ba_date, lubridate::dmy(Date))]
h <- head(lawadt)
kableExtra::kable(h, caption = "First 6 rows of LAWA data") %>% kable_styling()
| LawaSiteID | Site | parameter | Date | RawValue | Symbol | Value | Region | Agency | region.site | ba_date |
|---|---|---|---|---|---|---|---|---|---|---|
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | DRP | 1-Aug-17 | 0.012 | 0.012 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 | |
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | NH4 | 1-Aug-17 | 0.040 | 0.040 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 | |
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | TP | 1-Aug-17 | 0.052 | 0.052 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 | |
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | TON | 1-Aug-17 | 0.800 | 0.800 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 | |
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | TN | 1-Aug-17 | 1.160 | 1.160 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 | |
| ARC-00037 | Papakura @ Alfriston/Ardmore Rd | TURB | 1-Aug-17 | 21.000 | 21.000 | Auckland | AC | Auckland.Papakura @ Alfriston/Ardmore Rd | 2017-08-01 |
p <- makeTilePlot(lawadt[parameter == "ECOLI"], yvar = "Value", byvar = "region.site")
p + labs(y = "ECOLI", caption = "Observations for E.Coli") + guides(fill = guide_legend(title = "ECOLI"))
p <- makeTilePlot(lawadt[parameter == "NH4"], yvar = "Value", byvar = "region.site")
p + labs(y = "ECOLI", caption = "Observations for NH4") + guides(fill = guide_legend(title = "NH4"))
st <- summary(lawadt)
kableExtra::kable(st, caption = "Summary of LAWA data") %>% kable_styling()
| LawaSiteID | Site | parameter | Date | RawValue | Symbol | Value | Region | Agency | region.site | ba_date | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:696184 | Length:696184 | Length:696184 | Length:696184 | Min. : -21 | Length:696184 | Min. : -21 | Length:696184 | Length:696184 | Length:696184 | Min. :2008-01-03 | |
| Class :character | Class :character | Class :character | Class :character | 1st Qu.: 0 | Class :character | 1st Qu.: 0 | Class :character | Class :character | Class :character | 1st Qu.:2011-02-22 | |
| Mode :character | Mode :character | Mode :character | Mode :character | Median : 1 | Mode :character | Median : 1 | Mode :character | Mode :character | Mode :character | Median :2013-10-09 | |
| NA | NA | NA | NA | Mean : 132 | NA | Mean : 124 | NA | NA | NA | Mean :2013-07-01 | |
| NA | NA | NA | NA | 3rd Qu.: 7 | NA | 3rd Qu.: 7 | NA | NA | NA | 3rd Qu.:2015-12-09 | |
| NA | NA | NA | NA | Max. :11500000 | NA | Max. :11500000 | NA | NA | NA | Max. :2018-07-17 | |
| NA | NA | NA | NA | NA’s :72101 | NA | NA’s :9232 | NA | NA | NA | NA’s :8970 |
# looks like daily data with gaps
p <- makeLinePlot(lawadt[parameter == "ECOLI"], yvar = "Value", byvar = "region.site")
p <- p + labs(y = "E. coli", caption = "E. coli")
guides(colour = guide_legend(title = "E. coli")) + theme(legend.position = "bottom") +
facet_grid(Region ~ .)
## NULL
p
Figure 3.1: Test E.Coli data values by date and site
plotly::ggplotly(p)